A project on using historical data to forecast the price of the virtual currency Bitcoin
## [1] "gganimate" "heatmaply" "viridis" "viridisLite" "plotly"
## [6] "lubridate" "tidyr" "dplyr" "readxl" "caret"
## [11] "lattice" "ggplot2" "stats" "graphics" "grDevices"
## [16] "utils" "datasets" "methods" "base"
This project uses data on world economies for the years 1970-2020. The data cover gold prices, currency exchange rates, the S&P index and a set of economic indicators for individual countries over that period. Some of the data are available for a wider range of dates, but the main analysis focuses on this period, because all of the above data are available for it.
Dimensions of the raw input data (rows, columns):
## [1] 44304 55
Statistics
The dataset contains data for 1970-2020 on 213 indicators for 201 countries.
## [1] "Afghanistan" "Albania"
## [3] "Algeria" "American Samoa"
## [5] "Andorra" "Angola"
## [7] "Antigua and Barbuda" "Argentina"
## [9] "Armenia" "Aruba"
## [11] "Australia" "Austria"
## [13] "Azerbaijan" "Bahamas, The"
## [15] "Bahrain" "Bangladesh"
## [17] "Barbados" "Belarus"
## [19] "Belgium" "Belize"
## [21] "Benin" "Bermuda"
## [23] "Bhutan" "Bolivia"
## [25] "Brazil" "British Virgin Islands"
## [27] "Bulgaria" "Burundi"
## [29] "Cambodia" "Cameroon"
## [31] "Canada" "Cayman Islands"
## [33] "Central African Republic" "Chad"
## [35] "Channel Islands" "Chile"
## [37] "China" "Colombia"
## [39] "Comoros" "Congo, Dem. Rep."
## [41] "Congo, Rep." "Costa Rica"
## [43] "Croatia" "Cuba"
## [45] "Curacao" "Cyprus"
## [47] "Czech Republic" "Denmark"
## [49] "Djibouti" "Dominica"
## [51] "Dominican Republic" "Ecuador"
## [53] "Egypt, Arab Rep." "El Salvador"
## [55] "Equatorial Guinea" "Eritrea"
## [57] "Estonia" "Eswatini"
## [59] "Ethiopia" "Faroe Islands"
## [61] "Fiji" "Finland"
## [63] "France" "French Polynesia"
## [65] "Gabon" "Gambia, The"
## [67] "Georgia" "Germany"
## [69] "Ghana" "Gibraltar"
## [71] "Greece" "Greenland"
## [73] "Grenada" "Guam"
## [75] "Guatemala" "Guinea"
## [77] "Guinea-Bissau" "Guyana"
## [79] "Haiti" "Honduras"
## [81] "Hong Kong SAR, China" "Hungary"
## [83] "Iceland" "India"
## [85] "Indonesia" "Iran, Islamic Rep."
## [87] "Iraq" "Ireland"
## [89] "Isle of Man" "Israel"
## [91] "Italy" "Jamaica"
## [93] "Japan" "Jordan"
## [95] "Kazakhstan" "Kenya"
## [97] "Kiribati" "Korea, Dem. People's Rep."
## [99] "Korea, Rep." "Kosovo"
## [101] "Kuwait" "Kyrgyz Republic"
## [103] "Lao PDR" "Latvia"
## [105] "Lebanon" "Lesotho"
## [107] "Liberia" "Libya"
## [109] "Liechtenstein" "Lithuania"
## [111] "Luxembourg" "Macao SAR, China"
## [113] "Madagascar" "Malawi"
## [115] "Malaysia" "Maldives"
## [117] "Mali" "Malta"
## [119] "Marshall Islands" "Mauritania"
## [121] "Mauritius" "Mexico"
## [123] "Micronesia, Fed. Sts." "Moldova"
## [125] "Monaco" "Mongolia"
## [127] "Montenegro" "Morocco"
## [129] "Mozambique" "Myanmar"
## [131] "Namibia" "Nepal"
## [133] "Netherlands" "New Caledonia"
## [135] "New Zealand" "Nicaragua"
## [137] "Niger" "Nigeria"
## [139] "North Macedonia" "Norway"
## [141] "Oman" "Pakistan"
## [143] "Panama" "Papua New Guinea"
## [145] "Paraguay" "Peru"
## [147] "Philippines" "Poland"
## [149] "Portugal" "Puerto Rico"
## [151] "Qatar" "Romania"
## [153] "Russian Federation" "Rwanda"
## [155] "San Marino" "Sao Tome and Principe"
## [157] "Saudi Arabia" "Senegal"
## [159] "Serbia" "Seychelles"
## [161] "Sierra Leone" "Singapore"
## [163] "Sint Maarten (Dutch part)" "Slovak Republic"
## [165] "Slovenia" "Solomon Islands"
## [167] "South Africa" "South Sudan"
## [169] "Spain" "St. Vincent and the Grenadines"
## [171] "Sudan" "Suriname"
## [173] "Sweden" "Switzerland"
## [175] "Syrian Arab Republic" "Tajikistan"
## [177] "Tanzania" "Thailand"
## [179] "Togo" "Tonga"
## [181] "Trinidad and Tobago" "Tunisia"
## [183] "Turkey" "Turks and Caicos Islands"
## [185] "Tuvalu" "Uganda"
## [187] "Ukraine" "United Arab Emirates"
## [189] "United Kingdom" "United States"
## [191] "Uruguay" "Uzbekistan"
## [193] "Vanuatu" "Venezuela, RB"
## [195] "Vietnam" "Virgin Islands (U.S.)"
## [197] "West Bank and Gaza" "Yemen, Rep."
## [199] "Zambia" "Zimbabwe"
## [201] "Bosnia and Herzegovina"
## [1] "Urban population growth (annual %)"
## [2] "Urban population (% of total population)"
## [3] "Urban population"
## [4] "Trade (% of GDP)"
## [5] "Total natural resources rents (% of GDP)"
## [6] "Total greenhouse gas emissions (kt of CO2 equivalent)"
## [7] "Taxes less subsidies on products (current US$)"
## [8] "Taxes less subsidies on products (current LCU)"
## [9] "Survival to age 65, female (% of cohort)"
## [10] "Survival to age 65, male (% of cohort)"
## [11] "Secondary education, teachers"
## [12] "Secondary education, pupils"
## [13] "School enrollment, tertiary (gross), gender parity index (GPI)"
## [14] "Rural population growth (annual %)"
## [15] "Rural population (% of total population)"
## [16] "Rural population"
## [17] "Pupil-teacher ratio, tertiary"
## [18] "Pupil-teacher ratio, secondary"
## [19] "Pupil-teacher ratio, primary"
## [20] "Pupil-teacher ratio, preprimary"
## [21] "Primary school starting age (years)"
## [22] "Population, total"
## [23] "Population, male"
## [24] "Population, male (% of total population)"
## [25] "Population, female (% of total population)"
## [26] "Population, female"
## [27] "Population in urban agglomerations of more than 1 million"
## [28] "Population in the largest city (% of urban population)"
## [29] "Population in largest city"
## [30] "Population growth (annual %)"
## [31] "Population density (people per sq. km of land area)"
## [32] "Population ages 65 and above (% of total population)"
## [33] "Population ages 15-64 (% of total population)"
## [34] "Population ages 0-14 (% of total population)"
## [35] "Number of under-five deaths"
## [36] "Nitrous oxide emissions (thousand metric tons of CO2 equivalent)"
## [37] "Nitrous oxide emissions in energy sector (% of total)"
## [38] "Net primary income (Net income from abroad) (current US$)"
## [39] "Net primary income (Net income from abroad) (current LCU)"
## [40] "Net official development assistance received (current US$)"
## [41] "Net domestic credit (current LCU)"
## [42] "Natural gas rents (% of GDP)"
## [43] "Mortality rate, infant (per 1,000 live births)"
## [44] "Methane emissions (kt of CO2 equivalent)"
## [45] "Methane emissions in energy sector (thousand metric tons of CO2 equivalent)"
## [46] "Merchandise exports to high-income economies (% of total merchandise exports)"
## [47] "Life expectancy at birth, total (years)"
## [48] "Land area (sq. km)"
## [49] "Imports of goods and services (current US$)"
## [50] "Imports of goods and services (% of GDP)"
## [51] "Gross national expenditure (% of GDP)"
## [52] "Gross national expenditure (current US$)"
## [53] "Gross domestic savings (% of GDP)"
## [54] "Gross domestic savings (current US$)"
## [55] "GDP per capita (current US$)"
## [56] "GDP (current US$)"
## [57] "Fuel exports (% of merchandise exports)"
## [58] "Fuel imports (% of merchandise imports)"
## [59] "Food exports (% of merchandise exports)"
## [60] "Food imports (% of merchandise imports)"
## [61] "Exports of goods and services (current US$)"
## [62] "CO2 emissions from solid fuel consumption (% of total)"
## [63] "CO2 emissions from solid fuel consumption (kt)"
## [64] "CO2 emissions from liquid fuel consumption (kt)"
## [65] "CO2 emissions from liquid fuel consumption (% of total)"
## [66] "CO2 emissions from gaseous fuel consumption (kt)"
## [67] "CO2 emissions from gaseous fuel consumption (% of total)"
## [68] "CO2 emissions (metric tons per capita)"
## [69] "CO2 emissions (kt)"
## [70] "Birth rate, crude (per 1,000 people)"
## [71] "Short-term debt (% of total external debt)"
## [72] "Portfolio investment, bonds (PPG + PNG) (NFL, current US$)"
## [73] "Inflation, consumer prices (annual %)"
## [74] "GNI growth (annual %)"
## [75] "GDP per capita growth (annual %)"
## [76] "GDP growth (annual %)"
## [77] "External debt stocks (% of GNI)"
## [78] "Exports of goods and services (annual % growth)"
## [79] "Consumer price index (2010 = 100)"
## [80] "CO2 emissions (kg per 2010 US$ of GDP)"
## [81] "Unemployment, total (% of total labor force) (national estimate)"
## [82] "Taxes less subsidies on products (constant LCU)"
## [83] "Short-term debt (% of total reserves)"
## [84] "Services, value added (% of GDP)"
## [85] "Manufacturing, value added (% of GDP)"
## [86] "Government expenditure on education, total (% of GDP)"
## [87] "Real interest rate (%)"
## [88] "Portfolio equity, net inflows (BoP, current US$)"
## [89] "Lending interest rate (%)"
## [90] "Electricity production from renewable sources, excluding hydroelectric (kWh)"
## [91] "Electricity production from renewable sources, excluding hydroelectric (% of total)"
## [92] "Electricity production from oil, gas and coal sources (% of total)"
## [93] "Electricity production from coal sources (% of total)"
## [94] "Electricity production from hydroelectric sources (% of total)"
## [95] "Electricity production from natural gas sources (% of total)"
## [96] "Electricity production from nuclear sources (% of total)"
## [97] "CO2 emissions from transport (% of total fuel combustion)"
## [98] "CO2 intensity (kg per kg of oil equivalent energy use)"
## [99] "CO2 emissions from residential buildings and commercial and public services (% of total fuel combustion)"
## [100] "CO2 emissions from other sectors, excluding residential buildings and commercial and public services (% of total fuel combustion)"
## [101] "CO2 emissions from manufacturing industries and construction (% of total fuel combustion)"
## [102] "CO2 emissions from electricity and heat production, total (% of total fuel combustion)"
## [103] "Transport services (% of commercial service exports)"
## [104] "Transport services (% of commercial service imports)"
## [105] "Service imports (BoP, current US$)"
## [106] "Service exports (BoP, current US$)"
## [107] "Primary income payments (BoP, current US$)"
## [108] "Primary income receipts (BoP, current US$)"
## [109] "Portfolio investment, net (BoP, current US$)"
## [110] "Net primary income (BoP, current US$)"
## [111] "Literacy rate, adult total (% of people ages 15 and above)"
## [112] "Goods exports (BoP, current US$)"
## [113] "Goods imports (BoP, current US$)"
## [114] "Individuals using the Internet (% of population)"
## [115] "Trade in services (% of GDP)"
## [116] "Net primary income (Net income from abroad) (constant LCU)"
## [117] "Gross savings (% of GDP)"
## [118] "Gross savings (current US$)"
## [119] "Short-term debt (% of exports of goods, services and primary income)"
## [120] "Deposit interest rate (%)"
## [121] "Net acquisition of financial assets (% of GDP)"
## [122] "Income share held by highest 10%"
## [123] "Renewable internal freshwater resources per capita (cubic meters)"
## [124] "Renewable internal freshwater resources, total (billion cubic meters)"
## [125] "Average precipitation in depth (mm per year)"
## [126] "Taxes on goods and services (current LCU)"
## [127] "Taxes on income, profits and capital gains (% of revenue)"
## [128] "Taxes on income, profits and capital gains (% of total taxes)"
## [129] "Taxes on income, profits and capital gains (current LCU)"
## [130] "Taxes on international trade (% of revenue)"
## [131] "Taxes on international trade (current LCU)"
## [132] "Taxes on goods and services (% of revenue)"
## [133] "Taxes on exports (current LCU)"
## [134] "Taxes on exports (% of tax revenue)"
## [135] "Tax revenue (current LCU)"
## [136] "Tax revenue (% of GDP)"
## [137] "Interest payments (% of expense)"
## [138] "Expense (% of GDP)"
## [139] "Taxes on goods and services (% value added of industry and services)"
## [140] "Stocks traded, total value (current US$)"
## [141] "Stocks traded, total value (% of GDP)"
## [142] "Stocks traded, turnover ratio of domestic shares (%)"
## [143] "Share of youth not in education, employment or training, female (% of female youth population)"
## [144] "Share of youth not in education, employment or training, male (% of male youth population)"
## [145] "Share of youth not in education, employment or training, total (% of youth population)"
## [146] "Part time employment, total (% of total employment)"
## [147] "Trademark applications, direct nonresident"
## [148] "Trademark applications, direct resident"
## [149] "Trademark applications, total"
## [150] "Patent applications, nonresidents"
## [151] "Patent applications, residents"
## [152] "Pupil-teacher ratio, upper secondary"
## [153] "Renewable energy consumption (% of total final energy consumption)"
## [154] "Renewable electricity output (% of total electricity output)"
## [155] "PM2.5 air pollution, mean annual exposure (micrograms per cubic meter)"
## [156] "PM2.5 air pollution, population exposed to levels exceeding WHO guideline value (% of total)"
## [157] "PM2.5 pollution, population exposed to levels exceeding WHO Interim Target-1 value (% of total)"
## [158] "PM2.5 pollution, population exposed to levels exceeding WHO Interim Target-2 value (% of total)"
## [159] "PM2.5 pollution, population exposed to levels exceeding WHO Interim Target-3 value (% of total)"
## [160] "Labor force, total"
## [161] "International migrant stock (% of population)"
## [162] "Urban land area (sq. km)"
## [163] "CO2 emissions (kg per PPP $ of GDP)"
## [164] "CO2 emissions (kg per 2017 PPP $ of GDP)"
## [165] "Access to electricity (% of population)"
## [166] "Population living in slums (% of urban population)"
## [167] "S&P Global Equity Indices (annual % change)"
## [168] "Net official aid received (current US$)"
## [169] "Unemployment with advanced education (% of total labor force with advanced education)"
## [170] "Total greenhouse gas emissions (% change from 1990)"
## [171] "Self-employed, male (% of male employment) (modeled ILO estimate)"
## [172] "Self-employed, total (% of total employment) (modeled ILO estimate)"
## [173] "Self-employed, female (% of female employment) (modeled ILO estimate)"
## [174] "Nitrous oxide emissions (% change from 1990)"
## [175] "Methane emissions (% change from 1990)"
## [176] "Employment in industry (% of total employment) (modeled ILO estimate)"
## [177] "Employment in services (% of total employment) (modeled ILO estimate)"
## [178] "Employment in agriculture (% of total employment) (modeled ILO estimate)"
## [179] "Employers, total (% of total employment) (modeled ILO estimate)"
## [180] "Rail lines (total route-km)"
## [181] "Railways, goods transported (million ton-km)"
## [182] "Railways, passengers carried (million passenger-km)"
## [183] "International tourism, expenditures (current US$)"
## [184] "Research and development expenditure (% of GDP)"
## [185] "Researchers in R&D (per million people)"
## [186] "Proportion of seats held by women in national parliaments (%)"
## [187] "Trained teachers in upper secondary education (% of total teachers)"
## [188] "Trained teachers in primary education (% of total teachers)"
## [189] "Trained teachers in secondary education (% of total teachers)"
## [190] "Suicide mortality rate (per 100,000 population)"
## [191] "Suicide mortality rate, female (per 100,000 female population)"
## [192] "Suicide mortality rate, male (per 100,000 male population)"
## [193] "Scientific and technical journal articles"
## [194] "Mortality caused by road traffic injury (per 100,000 population)"
## [195] "Total alcohol consumption per capita (liters of pure alcohol, projected estimates, 15+ years of age)"
## [196] "ICT goods exports (% of total goods exports)"
## [197] "Current health expenditure per capita (current US$)"
## [198] "Current health expenditure (% of GDP)"
## [199] "Bank capital to assets ratio (%)"
## [200] "Prevalence of undernourishment (% of population)"
## [201] "Time required to enforce a contract (days)"
## [202] "Automated teller machines (ATMs) (per 100,000 adults)"
## [203] "Time required to build a warehouse (days)"
## [204] "Tax payments (number)"
## [205] "Value lost due to electrical outages (% of sales for affected firms)"
## [206] "Average number of visits or required meetings with tax officials (for affected firms)"
## [207] "Time required to get electricity (days)"
## [208] "Secure Internet servers"
## [209] "Secure Internet servers (per 1 million people)"
## [210] "Diabetes prevalence (% of population ages 20 to 79)"
## [211] "Account ownership at a financial institution or with a mobile-money-service provider (% of population ages 15+)"
## [212] "Strength of legal rights index (0=weak to 12=strong)"
## [213] "Ease of doing business score (0 = lowest performance to 100 = best performance)"
In addition, the raw data include the following aggregate labels:
Low & middle income, Low income, Lower middle income, Middle income, World, Upper middle income, High income
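The counts above can be reproduced by collecting distinct country and indicator names while skipping the aggregate labels. The row count of the raw table is consistent with one row per entity and indicator: 213 indicators × 208 entities = 44304, where 208 = 201 countries + 7 aggregates. Below is a minimal Python sketch (the original analysis was done in R); it assumes each raw row is a dict with `Country Name` and `Series Name` keys, and the function `count_entities` is hypothetical.

```python
# Aggregate labels present in the raw data alongside individual countries.
AGGREGATES = {
    "Low & middle income", "Low income", "Lower middle income",
    "Middle income", "World", "Upper middle income", "High income",
}


def count_entities(rows):
    """Return (number of countries, number of indicators), skipping aggregates.

    `rows` is an iterable of dicts with "Country Name" and "Series Name" keys.
    """
    countries = set()
    indicators = set()
    for row in rows:
        indicators.add(row["Series Name"])
        if row["Country Name"] not in AGGREGATES:
            countries.add(row["Country Name"])
    return len(countries), len(indicators)


# Illustrative rows only, not taken from the real dataset.
sample = [
    {"Country Name": "Poland", "Series Name": "Population, total"},
    {"Country Name": "Poland", "Series Name": "GDP (current US$)"},
    {"Country Name": "Germany", "Series Name": "Population, total"},
    {"Country Name": "World", "Series Name": "Population, total"},
]
print(count_entities(sample))  # (2, 2)
```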
Most frequently reported indicators (Nazwa = indicator name, Wystąpienia = number of occurrences)
## # A tibble: 10 x 2
## Nazwa Wystąpienia
## <chr> <int>
## 1 Population, total 10576
## 2 Population growth (annual %) 10573
## 3 Rural population (% of total population) 10548
## 4 Urban population (% of total population) 10548
## 5 Rural population 10525
## 6 Urban population 10525
## 7 Urban population growth (annual %) 10523
## 8 Land area (sq. km) 10492
## 9 Population density (people per sq. km of land area) 10469
## 10 Rural population growth (annual %) 10146
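A ranking like the one above can be built by counting, for each indicator, the observations that are not missing. This is a Python sketch under the assumption that the Wystąpienia column counts non-empty (country, year) values per indicator; the maximum possible is 208 entities × 51 years = 10608, which is consistent with the top value of 10576. Rows are assumed to be `(country, indicator, year, value)` tuples with `None` for missing values; `top_indicators` is a hypothetical helper.

```python
from collections import Counter


def top_indicators(long_rows, n=10):
    """Return the n indicators with the most non-missing observations.

    `long_rows` holds (country, indicator, year, value) tuples, where value
    is None for a missing observation.
    """
    counts = Counter(
        indicator
        for _country, indicator, _year, value in long_rows
        if value is not None
    )
    return counts.most_common(n)


# Made-up rows for illustration only.
rows = [
    ("Country A", "Population, total", 2019, 100.0),
    ("Country A", "Population, total", 2020, 101.0),
    ("Country A", "Land area (sq. km)", 2019, 50.0),
    ("Country A", "Land area (sq. km)", 2020, None),
]
print(top_indicators(rows, n=2))
# [('Population, total', 2), ('Land area (sq. km)', 1)]
```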
For further analysis we use typical indicators of economic development and of overall population dynamics:
## # A tibble: 8 x 5
## `Series Name` mean max min median
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Birth rate, crude (per 1,000 people) 26.4 5.69e1 5.9 e+0 2.40e1
## 2 GDP per capita (current US$) 9812. 1.91e5 2.28e+1 2.52e3
## 3 GDP per capita growth (annual %) 1.79 1.40e2 -6.50e+1 2.03e0
## 4 Life expectancy at birth, total (years) 66.1 8.54e1 1.89e+1 6.89e1
## 5 Population growth (annual %) 1.67 1.76e1 -1.10e+1 1.55e0
## 6 Population, total 124770119. 7.75e9 5.74e+3 5.26e6
## 7 Trade (% of GDP) 79.8 8.61e2 2.10e-2 6.77e1
## 8 Urban population (% of total population) 53.8 1 e2 2.84e+0 5.32e1
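The table above reports mean, max, min and median per indicator. The large gap between mean and median for GDP per capita (about 9812 vs 2520) and for total population indicates strongly right-skewed distributions, so the median is the more representative "typical" value there. A minimal Python sketch of the same summary, assuming `(country, indicator, year, value)` rows with `None` for missing values (`summarise` is a hypothetical name):

```python
import statistics
from collections import defaultdict


def summarise(long_rows, selected):
    """Return {indicator: (mean, max, min, median)} for the selected indicators.

    Missing values (None) are skipped, as R's na.rm = TRUE would do.
    """
    values = defaultdict(list)
    for _country, indicator, _year, value in long_rows:
        if indicator in selected and value is not None:
            values[indicator].append(value)
    return {
        name: (statistics.mean(v), max(v), min(v), statistics.median(v))
        for name, v in values.items()
    }


# Made-up rows for illustration only.
rows = [
    ("Country A", "Trade (% of GDP)", 2000, 40.0),
    ("Country B", "Trade (% of GDP)", 2000, 80.0),
    ("Country C", "Trade (% of GDP)", 2000, 120.0),
    ("Country D", "Trade (% of GDP)", 2000, None),
    ("Country A", "Land area (sq. km)", 2000, 10.0),
]
print(summarise(rows, {"Trade (% of GDP)"}))
# {'Trade (% of GDP)': (80.0, 120.0, 40.0, 80.0)}
```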
The available Bitcoin exchange-rate data cover the years 2009-2021.
## Date Value
## Length:4661 Min. : 0.0
## Class :character 1st Qu.: 7.2
## Mode :character Median : 431.9
## Mean : 5141.2
## 3rd Qu.: 6499.1
## Max. :63554.4
The averaged daily gold/USD price is used; the data cover the years 1968-2021.
## Date USD..AM. USD..PM. GBP..AM.
## Length:13585 Min. : 34.77 Min. : 34.75 Min. : 14.48
## Class :character 1st Qu.: 280.50 1st Qu.: 281.50 1st Qu.: 177.71
## Mode :character Median : 383.32 Median : 383.50 Median : 234.51
## Mean : 575.20 Mean : 576.62 Mean : 370.84
## 3rd Qu.: 841.94 3rd Qu.: 851.50 3rd Qu.: 454.32
## Max. :2061.50 Max. :2067.15 Max. :1574.37
## NA's :1 NA's :143 NA's :11
## GBP..PM. EURO..AM. EURO..PM.
## Min. : 14.48 Min. : 237.3 Min. : 236.7
## 1st Qu.: 178.23 1st Qu.: 335.3 1st Qu.: 335.2
## Median : 234.96 Median : 892.6 Median : 896.1
## Mean : 371.81 Mean : 797.3 Mean : 797.2
## 3rd Qu.: 456.43 3rd Qu.:1114.1 3rd Qu.:1114.9
## Max. :1569.59 Max. :1743.8 Max. :1743.4
## NA's :154 NA's :7837 NA's :7880
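The summary shows that the USD morning fix (`USD..AM.`) has 1 missing value while the afternoon fix (`USD..PM.`) has 143, so the way the daily average treats missing fixes matters. A minimal Python sketch, assuming "averaged daily price" means the mean of the USD AM and PM fixes, falling back to the single available fix; `daily_gold_usd` is a hypothetical helper.

```python
def daily_gold_usd(am, pm):
    """Average the AM and PM USD fixes for one day.

    Falls back to the single available fix when the other is missing (None)
    and returns None when both are missing.
    """
    fixes = [x for x in (am, pm) if x is not None]
    if not fixes:
        return None
    return sum(fixes) / len(fixes)


# Made-up fixes for illustration only.
print(daily_gold_usd(1800.0, 1810.0))  # 1805.0
print(daily_gold_usd(1800.0, None))    # 1800.0
```

Dropping days with a missing PM fix instead would also be defensible; the fallback keeps those 143 days in the series.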
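From the S&P index data, the real (inflation-adjusted) price is used. The available data cover the years 1871-2021. The summary below describes the World Development Indicators data in long format (one row per country, indicator and year, 1970-2020).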
## Country Name Country Code Series Name Series Code
## Length:1129328 Length:1129328 Length:1129328 Length:1129328
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## Year Value
## Min. :1970 Min. : -481321056610000
## 1st Qu.:1988 1st Qu.: 6
## Median :2000 Median : 41
## Mean :1998 Mean : 351647861972
## 3rd Qu.:2010 3rd Qu.: 9240
## Max. :2020 Max. :10211700000000000
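The long-format table above has 1,129,328 rows, roughly half of the 44304 × 51 = 2,259,504 possible (row, year) combinations of the raw wide table, which is consistent with missing cells being dropped during the reshape. A minimal Python sketch of such a wide-to-long reshape (the R equivalent would be a pivot with tidyr); it assumes one column per year named like `"1970"` and that missing cells are empty or hold `..` as in World Bank DataBank exports. `wide_to_long` is a hypothetical name.

```python
# Cell values treated as missing (assumption: empty cells or "..").
MISSING = {None, "", ".."}


def wide_to_long(wide_rows, years):
    """Turn one-column-per-year rows into (country, indicator, year, value) rows.

    Missing cells are dropped, so the result can be much shorter than
    len(wide_rows) * len(years).
    """
    long_rows = []
    for row in wide_rows:
        for year in years:
            cell = row.get(str(year))
            if cell in MISSING:
                continue
            long_rows.append(
                (row["Country Name"], row["Series Name"], year, float(cell))
            )
    return long_rows


# Made-up row for illustration only.
wide = [
    {"Country Name": "Country A", "Series Name": "Population growth (annual %)",
     "1970": "0.9", "1971": "..", "1972": "0.8"},
]
print(wide_to_long(wide, range(1970, 1973)))
```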
The collected data make it possible to analyse a wide range of processes, both economic and socio-economic.
The chart below shows the changes in fertility typical of countries joining the group of developed economies (India and China as examples), compared with the already stabilised demographic situation in Germany.
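Series for such a chart can be extracted from the long-format data by filtering one indicator and a few countries, then sorting by year. This Python sketch assumes the chart is based on "Birth rate, crude (per 1,000 people)" from the indicator list and on `(country, indicator, year, value)` rows; `series_by_country` is a hypothetical helper, and the values are made up.

```python
from collections import defaultdict


def series_by_country(long_rows, indicator, countries):
    """Return {country: [(year, value), ...]} sorted by year for one indicator."""
    out = defaultdict(list)
    for country, name, year, value in long_rows:
        if name == indicator and country in countries and value is not None:
            out[country].append((year, value))
    return {country: sorted(points) for country, points in out.items()}


BIRTH_RATE = "Birth rate, crude (per 1,000 people)"

# Made-up values for illustration only.
rows = [
    ("India", BIRTH_RATE, 1971, 30.0),
    ("India", BIRTH_RATE, 1970, 31.0),
    ("Germany", BIRTH_RATE, 1970, 10.0),
    ("Poland", BIRTH_RATE, 1970, 15.0),
]
print(series_by_country(rows, BIRTH_RATE, {"India", "China", "Germany"}))
# {'India': [(1970, 31.0), (1971, 30.0)], 'Germany': [(1970, 10.0)]}
```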
The present analysis focuses mainly on economic indicators and the price of gold. The chart below shows a correlation between the price of gold and the price of the digital currency Bitcoin. We may assume that the rise in the valuation of both assets is linked to a common factor or economic process. For a preliminary analysis of either asset, however, a correlated factor is sufficient: information about the other asset.
The correlation map of several selected economic parameters shown below cannot establish the cause of the observed changes, but it does give some insight into which assets and economic indicators behave similarly to the two discussed above. It also shows that the S&P index moves in a partly correlated way.
The map also shows a correlation between two demographic indicators: the birth rate and the population growth rate.
The following section gives an example of using linear regression to estimate changes in one variable from correlated data.
The data presented in section 2 were used to estimate the price of gold.
## Linear Regression
##
## 1338 samples
## 12 predictor
##
## No pre-processing
## Resampling: Bootstrapped (25 reps)
## Summary of sample sizes: 1338, 1338, 1338, 1338, 1338, 1338, ...
## Resampling results:
##
## RMSE Rsquared MAE
## 220.3678 0.09790259 180.4742
##
## Tuning parameter 'intercept' was held constant at a value of TRUE
Using the correlations observed earlier, the available S&P index and World Development Indicators data, and a linear regression algorithm, it was possible to estimate the Bitcoin price with a certain degree of accuracy. A subset of the data (70%) was used to train the algorithm, and the accuracy of the estimates was checked on data the algorithm had not seen before (the test set).
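The workflow described above (a 70/30 train/test split, an ordinary least squares fit on the training part, evaluation on the held-out part) can be sketched in plain Python. The model output header ("Linear Regression", intercept held at TRUE) suggests caret's `lm` method, which fits by ordinary least squares. This sketch is not the author's code: it solves the normal equations directly, uses a fixed seed of 42 chosen arbitrarily, and runs on synthetic noise-free data; all function names are hypothetical.

```python
import math
import random


def train_test_split(rows, train_share=0.7, seed=42):
    """Shuffle a copy of rows with a fixed seed and split it into train/test."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_share)
    return shuffled[:cut], shuffled[cut:]


def fit_ols(X, y):
    """Ordinary least squares [intercept, b1, ..., bk] via the normal equations.

    Solves (X'X) b = X'y with Gauss-Jordan elimination and partial pivoting.
    """
    design = [[1.0] + list(row) for row in X]  # prepend intercept column
    k = len(design[0])
    # Augmented matrix [X'X | X'y].
    a = [
        [sum(r[i] * r[j] for r in design) for j in range(k)]
        + [sum(r[i] * t for r, t in zip(design, y))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        if abs(a[col][col]) < 1e-12:
            raise ValueError("design matrix is singular (collinear predictors)")
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                for c in range(col, k + 1):
                    a[r][c] -= factor * a[col][c]
    return [a[i][k] / a[i][i] for i in range(k)]


def predict(coef, X):
    """Apply fitted coefficients to rows of predictors."""
    return [coef[0] + sum(b * x for b, x in zip(coef[1:], row)) for row in X]


def rmse(observed, predicted):
    """Root mean squared error."""
    sq = [(p - o) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(sq) / len(sq))


# Synthetic data: y = 2 + 3*x1 - x2 without noise, for illustration only.
data = [((float(i), float((7 * i) % 5)), 2.0 + 3.0 * i - (7 * i) % 5) for i in range(20)]
train, test = train_test_split(data)
coef = fit_ols([x for x, _ in train], [t for _, t in train])
test_pred = predict(coef, [x for x, _ in test])
print([round(c, 6) for c in coef])  # approximately [2.0, 3.0, -1.0]
print(rmse([t for _, t in test], test_pred))  # close to 0
```

Solving the normal equations keeps the sketch dependency-free; for real data with many correlated predictors, a QR-based solver (as R's `lm` uses) is numerically safer.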